Linear Algorithm for Conservative Degenerate Pattern Matching

Authors

  • Maxime Crochemore
  • Costas S. Iliopoulos
  • Ritu Kundu
  • Manal Mohamed
  • Fatima Vayani

Abstract

A degenerate symbol x̃ over an alphabet Σ is a non-empty subset of Σ, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem of conservative degenerate strings and present the first linear-time algorithm that can find, for given degenerate strings P̃ and T̃ of total length n containing k non-solid symbols in total, the occurrences of P̃ in T̃ in O(nk) time, which is linear in n because k is a fixed constant.
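The definitions above can be made concrete with a small sketch. This is not the authors' O(nk) algorithm: it is a naive quadratic matcher that assumes the usual convention that two degenerate symbols match when their sets intersect, and that "conservative" counts the non-solid symbols of P̃ and T̃ together. The bracket notation and the function names (`parse`, `non_solid_count`, `is_conservative`, `naive_occurrences`) are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, List, Sequence

# A degenerate symbol is a non-empty subset of the alphabet;
# a solid symbol is a singleton set.
DegSymbol = FrozenSet[str]
DegString = List[DegSymbol]


def parse(s: str) -> DegString:
    """Parse a hypothetical notation: plain letters are solid symbols,
    bracketed groups such as [AC] are non-solid symbols."""
    out: DegString = []
    i = 0
    while i < len(s):
        if s[i] == "[":
            j = s.index("]", i)  # raises ValueError if the bracket is unclosed
            sym = frozenset(s[i + 1:j])
            if not sym:
                raise ValueError("degenerate symbols must be non-empty")
            out.append(sym)
            i = j + 1
        else:
            out.append(frozenset(s[i]))
            i += 1
    return out


def non_solid_count(x: Sequence[DegSymbol]) -> int:
    """Number of symbols that contain more than one letter."""
    return sum(1 for sym in x if len(sym) > 1)


def is_conservative(p: Sequence[DegSymbol], t: Sequence[DegSymbol], k: int) -> bool:
    """Assumption: the bound k applies to the non-solid symbols of P and T combined."""
    return non_solid_count(p) + non_solid_count(t) <= k


def naive_occurrences(p: Sequence[DegSymbol], t: Sequence[DegSymbol]) -> List[int]:
    """Naive O(|P|*|T|) baseline: P occurs at i when every aligned pair
    of symbols has a non-empty intersection."""
    m, n = len(p), len(t)
    return [i for i in range(n - m + 1)
            if all(p[j] & t[i + j] for j in range(m))]


if __name__ == "__main__":
    T = parse("AC[GT]TA[AC]G")
    P = parse("[CG]T")
    print(naive_occurrences(P, T))   # [1, 2]
    print(is_conservative(P, T, 3))  # True: 1 + 2 non-solid symbols
```

Because each alignment may inspect every pattern position, this baseline runs in O(|P̃|·|T̃|) time; the point of the paper is to exploit the small number k of non-solid symbols to reach O(nk).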

Similar resources

A New Approach to Pattern Matching in Degenerate DNA/RNA Sequences and Distributed Pattern Matching

In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for the above problem that works in O(n log m) time, where n and m are the lengths of the text and the pattern, respectively. We also suggest an efficient implementation of our algorithm...

Efficient Pattern Matching in Elastic-Degenerate Strings

In this paper, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of k > 1 seeds (substrings/subpatterns) interleaved by elastic-degenerate symbols such that each elastic-degenerate symbol corresponds to a set of two or more variable length strings. Here, we present an algorithm for solving the pattern matchi...

Pattern Matching in Degenerate DNA/RNA Sequences

In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for the above problem that works in O(n log m) time, where n and m are the lengths of the text and the pattern, respectively. We also suggest an efficient implementation of our algorithm...

Efficient pattern matching in degenerate strings with the Burrows-Wheeler transform

A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n, we present a new method based on the Burrows–Wheeler transform for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern-matching technique that works on both regular and dege...

Parallel Algorithms for Degenerate and Weighted Sequences Derived from High Throughput Sequencing Technologies

Novel high-throughput sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences in a single experiment and at a much lower cost than previous methods. In this paper, we address the problem of efficiently mapping and classifying millions of degenerate and weighted sequences to a reference genome, based on whether they oc...

Journal:
  • Eng. Appl. of AI

Volume: 51

Publication date: 2016